
    A Bernstein-Von Mises Theorem for discrete probability distributions

    We investigate the asymptotic normality of the posterior distribution in the discrete setting, when the model dimension increases with the sample size. We consider a probability mass function $\theta_0$ on $\mathbb{N}\setminus\{0\}$ and a sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i\leq k_n}\theta_0(i)$. Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i\leq k_n}$ and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th coordinate is $\sqrt{n}\,(\hat{\theta}_n(i)-\theta_0(i))$ for $1\leq i\leq k_n$. We check that under mild conditions on $\theta_0$ and on the sequence of prior probabilities on the $k_n$-dimensional simplices, the variation distance between the posterior distribution, recentered around $\hat{\theta}_n$ and rescaled by $\sqrt{n}$, and the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ converges in probability to $0$. This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and R\'{e}nyi entropies. The proofs are based on concentration inequalities for centered and non-centered chi-square (Pearson) statistics. The latter allow us to establish posterior concentration rates with respect to the Fisher distance rather than the Hellinger distance, as is commonplace in non-parametric Bayesian statistics.
    Comment: Published at http://dx.doi.org/10.1214/08-EJS262 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
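
    To make the statement concrete, here is a minimal simulation sketch (our illustration, not code from the paper): with a Dirichlet prior on the simplex the posterior is again Dirichlet, so one can compare rescaled posterior draws with the Gaussian limit $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$ coordinate by coordinate. The choice of prior, sample sizes, and all names are ours.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100_000, 10                        # sample size and truncation level
    theta0 = np.full(k, 1.0 / k)              # true pmf on {1, ..., k}
    counts = rng.multinomial(n, theta0)
    theta_hat = counts / n                    # maximum likelihood estimate

    # Posterior under a Dirichlet(1, ..., 1) prior is Dirichlet(1 + counts).
    post = rng.dirichlet(1.0 + counts, size=20_000)
    z = np.sqrt(n) * (post - theta0)          # rescaled posterior draws
    delta_n = np.sqrt(n) * (theta_hat - theta0)

    # Marginally, coordinate i of z should look like
    # N(delta_n(i), theta0(i) * (1 - theta0(i))).
    print(np.abs(z.mean(axis=0) - delta_n).max())        # near 0
    print(z.std(axis=0).max(), np.sqrt(theta0 * (1 - theta0)).max())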

    Learning with Biased Complementary Labels

    In this paper, we study the classification problem in which we have access to an easily obtainable surrogate for true labels, namely complementary labels, which specify classes that observations do \textbf{not} belong to. Let $Y$ and $\bar{Y}$ be the true and complementary labels, respectively. We first model the annotation of complementary labels via transition probabilities $P(\bar{Y}=i \mid Y=j)$, $i\neq j\in\{1,\cdots,c\}$, where $c$ is the number of classes. Previous methods implicitly assume that $P(\bar{Y}=i \mid Y=j)$, $\forall i\neq j$, are identical, which is not true in practice because humans are biased toward their own experience. For example, as shown in Figure 1, if an annotator is more familiar with monkeys than prairie dogs when providing complementary labels for meerkats, she is more likely to employ "monkey" as a complementary label. We therefore reason that the transition probabilities will differ across class pairs. In this paper, we propose a framework that contributes three main innovations to learning with \textbf{biased} complementary labels: (1) it estimates transition probabilities with no bias; (2) it provides a general method to modify traditional loss functions and extends standard deep neural network classifiers to learn with biased complementary labels; (3) it theoretically ensures that the classifier learned with complementary labels converges to the optimal one learned with true labels. Comprehensive experiments on several benchmark datasets validate the superiority of our method over current state-of-the-art methods.
    Comment: ECCV 2018 Oral
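
    As a concrete (hypothetical) illustration of the loss-modification step, the sketch below shows the forward-correction idea: with a transition matrix $Q$, $Q_{ji} = P(\bar{Y}=i \mid Y=j)$, the model's complementary-label probabilities are $Q^\top p$, so an ordinary cross-entropy loss can be applied to the observed complementary labels. The function name and the uniform baseline $Q$ are ours, not the paper's.

    import numpy as np

    def corrected_loss(logits, ybar, Q):
        """Cross-entropy on complementary labels via the transition matrix Q."""
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)     # P(Y = j | x), shape (batch, c)
        pbar = p @ Q                          # P(Ybar = i | x) = sum_j p_j Q[j, i]
        return -np.log(pbar[np.arange(len(ybar)), ybar] + 1e-12).mean()

    c = 4
    Q = (1 - np.eye(c)) / (c - 1)             # unbiased special case: uniform off-diagonal
    logits = np.random.randn(8, c)
    ybar = np.random.randint(c, size=8)
    print(corrected_loss(logits, ybar, Q))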

    Structured Random Matrices

    Random matrix theory is a well-developed area of probability theory that has numerous connections with other areas of mathematics and its applications. Much of the literature in this area is concerned with matrices that possess many exact or approximate symmetries, such as matrices with i.i.d. entries, for which precise analytic results and limit theorems are available. Much less well understood are matrices that are endowed with an arbitrary structure, such as sparse Wigner matrices or matrices whose entries possess a given variance pattern. The challenge in investigating such structured random matrices is to understand how the given structure of the matrix is reflected in its spectral properties. This chapter reviews a number of recent results, methods, and open problems in this direction, with a particular emphasis on sharp spectral norm inequalities for Gaussian random matrices.
    Comment: 46 pages; to appear in IMA Volume "Discrete Structures: Analysis and Applications" (Springer)
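
    A quick numerical illustration (ours, not from the chapter): for a Gaussian matrix $X_{ij} = b_{ij} g_{ij}$ with a given variance pattern $(b_{ij})$, sharp bounds of the kind reviewed here control $\mathbb{E}\|X\|$, up to constants, by $\max_i (\sum_j b_{ij}^2)^{1/2} + \max_{ij} |b_{ij}| \sqrt{\log n}$.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    b = rng.uniform(0, 1, size=(n, n))        # an arbitrary variance pattern
    X = b * rng.standard_normal((n, n))       # Gaussian matrix with that pattern

    sigma = np.sqrt((b ** 2).sum(axis=1)).max()        # largest row norm of the pattern
    sigma_star = np.abs(b).max() * np.sqrt(np.log(n))  # largest-entry term
    print(np.linalg.norm(X, 2), sigma + sigma_star)    # spectral norm vs. two-term bound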

    PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers

    The aim of this paper is to generalize the PAC-Bayesian theorems proved by Catoni in the classification setting to more general problems of statistical inference. We show how to control the deviations of the risk of randomized estimators. Particular attention is paid to randomized estimators drawn in a small neighborhood of classical estimators, whose study leads to a control of the risk of the latter. These results allow us to bound the risk of very general estimation procedures, as well as to perform model selection.
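
    For orientation, one representative member of the PAC-Bayesian family (McAllester's bound in a common form; not the paper's specific result) reads: with probability at least $1-\delta$ over an i.i.d. sample of size $n$, simultaneously for every posterior $\rho$ over the parameter space,

    \[
    \mathbb{E}_{\theta\sim\rho}\, R(\theta) \;\leq\; \mathbb{E}_{\theta\sim\rho}\, r_n(\theta) \;+\; \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \log(2\sqrt{n}/\delta)}{2n}},
    \]

    where $R$ is the (bounded) risk, $r_n$ the empirical risk, and $\pi$ a prior fixed before seeing the data.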

    Convex recovery of a structured signal from independent random linear measurements

    This chapter develops a theoretical analysis of the convex programming method for recovering a structured signal from independent random linear measurements. This technique delivers bounds for the sampling complexity that are similar to recent results for standard Gaussian measurements, but the argument applies to a much wider class of measurement ensembles. To demonstrate the power of this approach, the paper presents a short analysis of phase retrieval by trace-norm minimization. The key technical tool is a framework, due to Mendelson and coauthors, for bounding a nonnegative empirical process.
    Comment: 18 pages, 1 figure. To appear in "Sampling Theory, a Renaissance." v2: minor corrections. v3: updated citations and increased emphasis on Mendelson's contribution
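
    A minimal sketch (ours) of the phase retrieval step mentioned above, in the PhaseLift style: lift $x$ to $X = xx^T$ and minimize the trace over PSD matrices consistent with the phaseless measurements. It assumes the cvxpy package; problem sizes are illustrative.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(2)
    n, m = 10, 60                             # signal dimension, number of measurements
    x = rng.standard_normal(n)
    A = rng.standard_normal((m, n))           # independent Gaussian measurement vectors
    b = (A @ x) ** 2                          # phaseless measurements |<a_k, x>|^2

    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.quad_form(A[k], X) == b[k] for k in range(m)]
    cp.Problem(cp.Minimize(cp.trace(X)), constraints).solve()

    # The scaled top eigenvector of the solution recovers x up to a global sign.
    w, V = np.linalg.eigh(X.value)
    xhat = np.sqrt(max(w[-1], 0)) * V[:, -1]
    print(min(np.linalg.norm(xhat - x), np.linalg.norm(xhat + x)))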

    Mirror Descent and Convex Optimization Problems With Non-Smooth Inequality Constraints

    We consider the problem of minimizing a convex function on a simple set under a convex non-smooth inequality constraint, and we describe first-order methods for such problems in different situations: smooth or non-smooth objective function; convex or strongly convex objective and constraint; deterministic or randomized information about the objective and constraint. We hope that it is convenient for the reader to have all the methods for the different settings in one place. The described methods are based on the Mirror Descent algorithm and the switching subgradient scheme. One of our goals is to propose, for the listed settings, a Mirror Descent method with adaptive stepsizes and an adaptive stopping rule, so that neither the stepsize nor the stopping rule requires knowledge of the Lipschitz constant of the objective or the constraint. We also construct a Mirror Descent method for problems whose objective function is not Lipschitz continuous, e.g. a quadratic function. Besides that, we address the problem of recovering a solution of the dual problem.
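
    A minimal sketch (ours, with illustrative problem data and fixed rather than adaptive stepsizes) of the switching subgradient scheme with an entropic mirror step on the simplex: take an objective step when the constraint is nearly satisfied, a constraint step otherwise, and return the average of the productive iterates.

    import numpy as np

    def mirror_step(x, g, eta):
        """Entropic (exponentiated-gradient) mirror descent step on the simplex."""
        y = x * np.exp(-eta * g)
        return y / y.sum()

    rng = np.random.default_rng(3)
    d = 20
    c = rng.standard_normal(d)
    f = lambda x: c @ x                               # linear objective
    g = lambda x: np.abs(x - 1.0 / d).sum() - 0.5     # non-smooth constraint g(x) <= 0

    x, eps, productive = np.full(d, 1.0 / d), 1e-2, []
    for _ in range(2000):
        if g(x) <= eps:                   # productive step: move on the objective
            x = mirror_step(x, c, eta=0.05)
            productive.append(x)
        else:                             # non-productive step: repair feasibility
            x = mirror_step(x, np.sign(x - 1.0 / d), eta=0.05)

    xbar = np.mean(productive, axis=0)    # average of productive iterates
    print(f(xbar), g(xbar))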

    Sparsity and Incoherence in Compressive Sampling

    We consider the problem of reconstructing a sparse signal $x^0\in\mathbb{R}^n$ from a limited number of linear measurements. Given $m$ randomly selected samples of $Ux^0$, where $U$ is an orthonormal matrix, we show that $\ell_1$ minimization recovers $x^0$ exactly when the number of measurements satisfies $m \geq \mathrm{Const}\cdot\mu^2(U)\cdot S\cdot\log n$, where $S$ is the number of nonzero components in $x^0$ and $\mu$ is the largest entry in $U$, properly normalized: $\mu(U) = \sqrt{n}\cdot\max_{k,j} |U_{k,j}|$. The smaller $\mu$, the fewer samples needed. The result holds for ``most'' sparse signals $x^0$ supported on a fixed (but arbitrary) set $T$. Given $T$, if the sign of $x^0$ for each nonzero entry on $T$ and the observed values of $Ux^0$ are drawn at random, the signal is recovered with overwhelming probability. Moreover, there is a sense in which this is nearly optimal, since any method succeeding with the same probability would require just about this many samples.
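
    A small illustration (ours) of the coherence parameter: every entry of the unitary DFT matrix has modulus $1/\sqrt{n}$, so $\mu = 1$ (maximally incoherent) and the sampling bound is most favorable, whereas $U = I$ gives $\mu = \sqrt{n}$ and the bound becomes vacuous.

    import numpy as np

    def mu(U):
        """Coherence mu(U) = sqrt(n) * max_{k,j} |U_{k,j}| of an orthonormal matrix."""
        return np.sqrt(U.shape[0]) * np.abs(U).max()

    n = 256
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)    # unitary DFT matrix
    print(mu(F))                              # 1.0: maximally incoherent
    print(mu(np.eye(n)))                      # sqrt(n): maximally coherent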